This document represents my typical approach when receiving a new dataset. I believe in meticulous documentation and exhaustive analysis. I understand that in the missionary department the questions asked often require quick responses, and that both documentation and in-depth analysis become afterthoughts (though I do remember Matt being very organized and great at saving our work in a well-documented manner). I have asked the data science intern I mentor/host to pursue the creation of an R package to semi-automatically generate documentation following the pattern observed below.
Each new dataset provided to an analyst requires a simple, quick, but nearly exhaustive data discovery analysis. Visualization is usually the best method for such an endeavor. The goal of the initial data discovery phase is to quickly surface the dataset’s structure, notable patterns, and potential problems.
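A minimal sketch of that first pass might look like the following. The data frame `missionary_data` and its columns (`mission`, `month`, `new_investigators`) are hypothetical placeholders standing in for the actual dataset.

```r
# First-pass data discovery; `missionary_data` and its columns are
# illustrative assumptions, not the real schema.
library(ggplot2)

str(missionary_data)      # column types, dimensions, sample values
summary(missionary_data)  # quick distributional summaries per column

# Small multiples: one panel per mission to eyeball trends and outliers
ggplot(missionary_data, aes(x = month, y = new_investigators)) +
  geom_line() +
  facet_wrap(~ mission)
```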
Sections below show some quick observations, which seamlessly become initial documentation of the dataset. This document required 6 hours of work, but it should serve as the building block for a framework for reproducible initial investigation of similar datasets.
Clearly a mission president effect is observed; note the color changes and regression lines by mission president for each mission.
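A plot of this kind can be sketched as below; the column names (`president`, `month`, `new_investigators`) are assumptions about the dataset, not its actual schema.

```r
# One regression line and color per mission president, faceted by
# mission, to make president-to-president shifts visible.
library(ggplot2)

ggplot(missionary_data,
       aes(x = month, y = new_investigators, color = president)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~ mission)
```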
However, we should confirm that the mission president effect isn’t overly influenced or confounded by the number of missionaries. The plot below confirms that the mission president has a potentially strong effect on the number of new investigators within a mission, even after accounting for the number of missionaries.
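One hedged way to formalize this check is a linear model with the missionary count as a covariate; `n_missionaries` and the other column names are assumptions for illustration.

```r
# Does `president` explain variance beyond the missionary count?
# Column names are illustrative placeholders.
fit <- lm(new_investigators ~ n_missionaries + president,
          data = missionary_data)
anova(fit)  # a significant `president` term supports the effect
```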
While new investigators represent the top of the funnel, baptisms/confirmations (the bottom of the funnel) also show a mission president effect.
Just to be thorough: although I was skeptical of a mission president effect on sacrament meeting attendance, such an effect appears to be real. It seems any analysis of missionary data will require attention to a potential mission president effect.
Outliers are possible and we would want to be careful about their effect on predictions/goals. Potential outliers identified here include:
All identified potential outliers were removed for subsequent data processing and analysis. There are likely other less obvious outliers that require attention.
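One way the removal could be scripted is shown below. The 3-standard-deviation rule and the column names are illustrative assumptions, not the rule actually used above.

```r
# Drop observations more than 3 SDs from their mission's mean;
# a sketch only -- threshold and columns are assumed.
library(dplyr)

cleaned <- missionary_data %>%
  group_by(mission) %>%
  filter(abs(new_investigators - mean(new_investigators)) <=
           3 * sd(new_investigators)) %>%
  ungroup()
```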
I did not correct any outliers in the “goal” KPIs, which certainly affects the area average calculations below.
Knowing that all missions within an area are not homogeneous makes mission-to-mission comparisons difficult. A simple comparison could be made between each mission and the area average, though that may not always be appropriate. While one might prefer to compare a mission to its own historical behavior/average, comparison to the area average accounts for some general or seasonal trending and helps appropriately illustrate mission-to-mission variability.
One may wish to smooth the area averages (likely using a moving average) to ensure that inherent variability doesn’t distract the analyst/viewer from the general trend; that was not done here.
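A sketch of such a comparison, using a 3-month centered moving average via `zoo::rollmean`, might look like this. The `area` column, window width, and join keys are all assumptions.

```r
# Mission-vs-area comparison with a smoothed area average;
# window size and column names are illustrative choices.
library(dplyr)
library(zoo)

area_avg <- missionary_data %>%
  group_by(area, month) %>%
  summarise(area_mean = mean(new_investigators), .groups = "drop") %>%
  group_by(area) %>%
  mutate(area_mean_smooth = rollmean(area_mean, k = 3, fill = NA)) %>%
  ungroup()

compared <- missionary_data %>%
  left_join(area_avg, by = c("area", "month")) %>%
  mutate(vs_area = new_investigators - area_mean_smooth)
```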
A quick introduction to trelliscopejs is necessary. Trelliscopejs is an R package that, like Tableau, enables the audience to interact visually with the data. Because trelliscopejs is built for use in R, we can leverage other important tools to mimic Tableau’s interactivity (in this case rbokeh, an R interface to the Bokeh JavaScript visualization library) and go well beyond Tableau’s limits by moving visually through various slices of the data using features derived directly from the data. The example below should help illustrate the advantages of such an approach, enabling the analyst to observe the data in ways not easily obtained using other business intelligence (BI) tools. BI reporting tools, e.g. Tableau and Business Objects, can then be leveraged for customized reporting based on the outcomes/results/observations generated via the interactive analysis.
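The combination described above can be sketched as follows, using trelliscopejs’ nest-and-`map_plot` pattern with an rbokeh panel per mission. The dataset, its columns, and the display name are assumptions for illustration.

```r
# One interactive rbokeh panel per mission, browsable and filterable
# in a trelliscope display; dataset and columns are assumed.
library(dplyr)
library(tidyr)
library(trelliscopejs)
library(rbokeh)

missionary_data %>%
  group_by(mission) %>%
  nest() %>%
  mutate(panel = map_plot(data, function(d)
    figure(xlab = "Month", ylab = "New investigators") %>%
      ly_points(month, new_investigators, data = d))) %>%
  trelliscope(name = "new_investigators_by_mission",
              nrow = 2, ncol = 4)
```

Because each panel is a full rbokeh figure, the viewer can zoom and hover within a panel while sorting and filtering panels on cognostics derived from the data.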